160 PART 4 Comparing Groups

Data from two potentially associated categorical variables is summarized as a
cross-tabulation, which is also called a cross-tab or a two-way table. Because we are
studying the association between two variables, this is a form of bivariate analysis.
The rows of the cross-tab represent the different categories (or levels) of one
variable, and the columns represent the different levels of the other variable. Each
cell of the table contains the number of participants with the indicated levels of
the row and column variables. If one variable can be thought of as the “cause” or
“predictor” of the other, the cause variable becomes the rows, and the “outcome” or
“effect” variable becomes the columns. If the cause and outcome variables are both
dichotomous, meaning they have only two levels (like in this example), then the
cross-tab has two rows and two columns. This structure has four cells containing
counts, and is referred to as a 2-by-2 (or 2 × 2) cross-tab, or a fourfold table.
Cross-tabs are displayed with an extra row at the bottom and an extra column at the
right to contain the sums of the cells in the rows and columns of the table. These
sums are called marginal totals, or just marginals.
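To make the layout concrete, here is a minimal sketch in Python that builds a fourfold table with marginal totals from individual-level records. The group names, outcome names, and counts are made up for illustration, and `cross_tab` is a hypothetical helper, not a function from any statistics package.

```python
from collections import Counter

# Hypothetical individual-level records: one (row level, column level) pair
# per participant. The counts are invented for illustration only.
records = (
    [("drug", "improved")] * 12
    + [("drug", "not improved")] * 8
    + [("placebo", "improved")] * 5
    + [("placebo", "not improved")] * 15
)

def cross_tab(pairs, row_levels, col_levels):
    """Return cell counts with marginal totals in the last row and column."""
    counts = Counter(pairs)
    table = []
    for r in row_levels:
        cells = [counts[(r, c)] for c in col_levels]
        table.append(cells + [sum(cells)])  # append the row marginal
    # Column marginals, ending with the grand total
    totals = [sum(row[j] for row in table) for j in range(len(col_levels) + 1)]
    table.append(totals)
    return table

row_levels = ["drug", "placebo"]
col_levels = ["improved", "not improved"]
table = cross_tab(records, row_levels, col_levels)

# Print the table with a header row and a "Total" row and column
print("".join(f"{h:>14}" for h in ["", *col_levels, "Total"]))
for label, row in zip(row_levels + ["Total"], table):
    print(f"{label:>14}" + "".join(f"{n:>14}" for n in row))
```

The predictor (treatment group) defines the rows and the outcome defines the columns, following the convention described above.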

Comparing proportions based on a fourfold table is the simplest example of testing
the association between two categorical variables. More generally, the variables can
have any number of categories, so the cross-tab can be larger than 2 × 2, with
multiple rows and many columns. But the basic question to be answered is always the
same: Is the spread of numbers across the columns so different from one row to the
next that the numbers can’t be explained away as random fluctuations? Another way of
asking the same question is: Is being a member of a particular row associated with
being a member of a particular column?
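One way to see the "spread of numbers across the columns" is to convert each row of counts to proportions of its row total. The sketch below does this for a hypothetical 2 × 2 table. The counts and the `row_proportions` helper are assumptions made for illustration.

```python
# Hypothetical 2 x 2 counts: rows are groups, columns are outcomes.
counts = {
    "drug":    {"improved": 12, "not improved": 8},
    "placebo": {"improved": 5,  "not improved": 15},
}

def row_proportions(table):
    """Divide each cell by its row total so that rows can be compared directly."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result

props = row_proportions(counts)
for row, cells in props.items():
    print(row, {col: round(p, 2) for col, p in cells.items()})
```

If the two rows show similar proportions, there is little sign of association. Whether a difference between them is larger than random fluctuation would produce is what the tests in this chapter assess.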

In this chapter, we describe two tests you can use to answer this question: the
Pearson chi-square test, and the Fisher Exact test. We also explain how to estimate
power and sample sizes for the chi-square and Fisher Exact tests.
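As a rough preview, both tests can be sketched for a fourfold table using only the Python standard library. This is an illustrative sketch under stated assumptions, not how any particular statistical package implements them: the chi-square version applies no continuity correction, the Fisher version uses one common two-sided convention (summing the probabilities of all tables that are no more likely than the observed one), and the counts are hypothetical.

```python
from math import comb, erfc, sqrt

def pearson_chi_square(a, b, c, d):
    """Chi-square statistic and p-value for a 2 x 2 table (1 df, uncorrected).

    Assumes every marginal total is nonzero; otherwise an expected count
    is zero and the statistic is undefined.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the row and column variables are unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2))
    return stat, erfc(sqrt(stat / 2))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value with all marginal totals held fixed."""
    r1, r2, c1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalized hypergeometric probability of k in the top-left cell
        return comb(r1, k) * comb(r2, c1 - k)

    lowest, highest = max(0, c1 - r2), min(r1, c1)
    observed = weight(a)
    # Exact integer comparison avoids floating-point ties
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / comb(r1 + r2, c1)

# Hypothetical counts: 12 of 20 improved on drug, 5 of 20 improved on placebo
chi2, p_chi2 = pearson_chi_square(12, 8, 5, 15)
p_fisher = fisher_exact_two_sided(12, 8, 5, 15)
print(f"chi-square = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"Fisher exact two-sided p = {p_fisher:.4f}")
```

In practice you would rely on your statistical software's built-in versions of these tests. The sketch is meant only to show that both start from the same four cell counts and their marginal totals.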

As with other statistical tests, you can run all the tests in this chapter from
individual-level data in a database, where there is one record per participant. But
the tests in this chapter can also be executed using data that has already been
summarized in the form of a cross-tab:

» Most statistical software is set up to work with individual-level data. In that
case, your data file needs to have two columns for the association you want to
test: one containing the categorical variable representing the treatment group
(or whatever variable defines the rows of the cross-tab), and one containing the
categorical variable representing the outcome. If you have the correct columns,
all you have to do is tell the statistical software you are using which test or
tests you want to run, and which variables to use in the test.